Monocular depth estimation has been actively studied in fields such as robot vision, autonomous driving, and 3D scene understanding. Given a sequence of color images, unsupervised learning methods based on the Structure-from-Motion (SfM) framework simultaneously predict depth and relative camera pose. However, dynamically moving objects in the scene violate the static-world assumption, resulting in inaccurate depths for dynamic objects. In this work, we propose a new method that addresses such object motion through monocular 3D object detection. Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels from the detected object pose, while the static pixels belonging to the rigid background are modeled with camera motion. In this way, the depth of every pixel can be learned via a meaningful geometric model. In addition, objects are detected as cuboids with absolute scale, which is used to eliminate the scale ambiguity inherent in monocular vision. Experiments on the KITTI depth dataset show that our method achieves state-of-the-art performance for depth estimation. Furthermore, joint training of depth, camera motion, and object pose also improves monocular 3D object detection performance. To the best of our knowledge, this is the first work that allows a monocular 3D object detection network to be fine-tuned in a self-supervised manner.
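The per-pixel correspondence described in the abstract above can be illustrated with the standard pinhole reprojection used in SfM-style self-supervision. This is a minimal sketch, not the authors' implementation: the function names, the intrinsics tuple `(fx, fy, cx, cy)`, and the idea of passing a separate rigid motion for dynamic pixels are assumptions made for illustration. A pixel is back-projected with its depth, moved by a rigid transform, and projected again; static pixels use the camera motion, while pixels on a detected object use a motion derived from the object pose.

```python
def matvec(matrix, vector):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(matrix[r][c] * vector[c] for c in range(3)) for r in range(3)]


def reproject(pixel, depth, intrinsics, rotation, translation):
    """Back-project a pixel with its depth, apply a rigid motion, project again."""
    fx, fy, cx, cy = intrinsics
    u, v = pixel
    # Back-projection: pixel plus depth gives a 3D point in the source camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Rigid motion: X' = R X + t.
    moved = [a + b for a, b in zip(matvec(rotation, point), translation)]
    if moved[2] <= 0:
        raise ValueError("point moved behind the camera")
    # Pinhole projection into the target view.
    return (fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy)


def warp_pixel(pixel, depth, intrinsics, camera_motion, object_motion=None):
    """Pick the motion model per pixel (hypothetical helper).

    Static pixels (object_motion is None) follow the camera motion;
    dynamic pixels follow the motion assumed to come from the detected object pose.
    """
    rotation, translation = camera_motion if object_motion is None else object_motion
    return reproject(pixel, depth, intrinsics, rotation, translation)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
INTRINSICS = (100.0, 100.0, 50.0, 40.0)  # assumed fx, fy, cx, cy

static_target = warp_pixel((50.0, 40.0), 10.0, INTRINSICS, (IDENTITY, [0.0, 0.0, 0.0]))
dynamic_target = warp_pixel(
    (50.0, 40.0), 10.0, INTRINSICS,
    (IDENTITY, [0.0, 0.0, 0.0]),
    object_motion=(IDENTITY, [1.0, 0.0, 0.0]),
)
```

With a stationary camera the static pixel maps onto itself, while the same pixel on an object that moved one unit sideways lands at a different column. That difference is what a static-world model cannot explain.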
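In this paper, we propose a sampling algorithm based on statistical machine learning to obtain the conditional nonlinear optimal perturbation (CNOP), which is essentially different from traditional deterministic optimization methods. The new approach not only replaces the extremely expensive gradient (first-order) information with objective-value (zeroth-order) information, but also avoids the adjoint technique, which causes a huge storage problem and instability from linearization. We also give an intuitive analysis and a rigorous concentration inequality for the gradient approximated by sampling. Numerical experiments that obtain CNOPs with standard spatial structures for a theoretical model, the Burgers equation with small viscosity, demonstrate that, at the cost of some accuracy, using fewer samples takes less time than both the adjoint-based method and computing the gradient directly from the definition. Finally, we show that the nonlinear time evolution of the CNOPs obtained by all the algorithms is almost consistent in terms of the squared norm of the perturbations and their difference and relative difference with respect to the definition-based method.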
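The zeroth-order idea in the CNOP abstract above, approximating a gradient from objective values alone, can be sketched as follows. This is an illustrative estimator under stated assumptions, not the paper's algorithm: it draws Gaussian random directions (the paper's sampling distribution is not given here), and `sampled_gradient` and `quadratic` are hypothetical names. Each sample contributes a finite-difference slope along a random direction, and the average of slope times direction approximates the gradient.

```python
import random


def sampled_gradient(f, x, n_samples=20000, eps=1e-3, seed=0):
    """Estimate the gradient of f at x from function values only.

    Assumption: directions u are standard Gaussian, so the average of
    ((f(x + eps*u) - f(x)) / eps) * u approximates the gradient.
    """
    rng = random.Random(seed)
    dim = len(x)
    fx = f(x)
    grad = [0.0] * dim
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        shifted = [xi + eps * ui for xi, ui in zip(x, u)]
        slope = (f(shifted) - fx) / eps  # directional finite difference
        for i in range(dim):
            grad[i] += slope * u[i] / n_samples
    return grad


def quadratic(x):
    """Toy objective with known gradient 2*x, used only to check the estimator."""
    return sum(xi * xi for xi in x)


point = [1.0, -2.0, 0.5]
estimate = sampled_gradient(quadratic, point)
exact = [2.0 * xi for xi in point]
```

No adjoint model or stored trajectory is needed; the price is sampling noise, which shrinks as the number of samples grows. This is the accuracy-versus-cost trade-off the abstract describes.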
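Predicting clinical outcomes of anti-cancer drugs on a personalized basis is challenging in cancer treatment because of tumor heterogeneity. Traditional computational efforts have modeled the effect of drug response on individual samples described by their molecular profiles, but overfitting occurs because of the high dimensionality of omics data, which keeps these models from clinical application. Recent research shows that deep learning is a promising approach for building drug response models by learning alignment patterns between drugs and samples. However, existing studies employ simple feature fusion strategies and consider only the drug features as a whole, ignoring the substructure information that may play a vital role when aligning drugs and genes. In this paper, we propose TCR (Transformer-based network for Cancer drug Response) to predict anti-cancer drug response. By utilizing an attention mechanism, TCR is able to learn the interactions between drug atoms/substructures and molecular signatures efficiently. In addition, a dual loss function and a cross-sampling strategy are designed to improve the prediction power of TCR. We show that TCR outperforms all other methods under various data-splitting strategies on all evaluation metrics (some with significant improvement). Extensive experiments demonstrate that TCR shows significantly improved generalization ability on independent in-vitro experiments and in-vivo real patient data. Our study highlights the prediction power of TCR and its potential value for cancer drug repurposing and precision oncology treatment.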
Optical flow estimation is a classical yet challenging task in computer vision. One of the essential factors in accurately predicting optical flow is handling occlusions between frames. However, occlusion remains a thorny problem for current top-performing optical flow estimation methods because there is insufficient local evidence to model occluded areas. In this paper, we propose the Super Kernel Flow Network (SKFlow), a CNN architecture that reduces the impact of occlusions on optical flow estimation. SKFlow benefits from super kernels, which bring enlarged receptive fields to complement the absent matching information and recover the occluded motions. We present efficient super kernel designs by utilizing conical connections and hybrid depth-wise convolutions. Extensive experiments demonstrate the effectiveness of SKFlow on multiple benchmarks, especially in the occluded areas. Without ImageNet-pre-trained backbones and with a modest increase in computation, SKFlow achieves compelling performance and ranks 1st among currently published methods on the Sintel benchmark. On the challenging Sintel clean and final passes (test), SKFlow surpasses the best-published result in the unmatched areas (7.96 and 12.50) by 9.09% and 7.92%. The code is available at https://github.com/littlespray/SKFlow.
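Encoder-decoder models have been widely used in RGBD semantic segmentation, and most of them are designed as two-stream networks. In general, jointly reasoning over the color and geometric information in RGBD data is beneficial for semantic segmentation. However, most existing approaches fail to comprehensively exploit multimodal information in both the encoder and the decoder. In this paper, we propose a novel attention-based dual supervised decoder for RGBD semantic segmentation. In the encoder, we design a simple yet effective attention-based multimodal fusion module to extract and fuse deep, multi-level, paired complementary information. To learn more robust deep representations and richer multimodal information, we introduce a dual-branch decoder that effectively leverages the correlations and complementary cues of different tasks. Extensive experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our method achieves superior performance compared with state-of-the-art methods.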
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto way to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This increases the structural heterogeneity of long documents and creates a granularity mismatch, leading to inferior distillation efficacy. In this work, we propose a new learning framework, fine-grained distillation (FGD), for long-document retrievers. While preserving the conventional dense retrieval paradigm, it first produces globally consistent representations across different fine granularities and then applies multi-granular aligned distillation only during training. In experiments, we evaluate our framework on two long-document retrieval benchmarks, on which it shows state-of-the-art performance.
An enhanced geothermal system is essential for providing sustainable, long-term geothermal energy supplies and reducing carbon emissions. An optimal well-control scheme for effective heat extraction and improved heat sweep efficiency plays a significant role in geothermal development. However, the performance of most existing optimization algorithms deteriorates as the dimension increases. To solve this issue, a novel surrogate-assisted level-based learning evolutionary search algorithm (SLLES) is proposed for heat extraction optimization of enhanced geothermal systems. SLLES consists of a classifier-assisted level-based learning pre-screening part and a local evolutionary search part. The cooperation of the two parts balances exploration and exploitation during the optimization process. After iteratively sampling from the design space, the robustness and effectiveness of the algorithm are shown to improve significantly. To the best of our knowledge, the proposed algorithm provides a state-of-the-art simulation-involved optimization framework. Comparative experiments have been conducted on benchmark functions, a two-dimensional fractured reservoir, and a three-dimensional enhanced geothermal system. The proposed algorithm outperforms five other state-of-the-art surrogate-assisted algorithms on all selected benchmark functions. The results on the two heat extraction cases also demonstrate that SLLES achieves superior optimization performance compared with a traditional evolutionary algorithm and other surrogate-assisted algorithms. This work lays a solid basis for efficient heat extraction from enhanced geothermal systems and sheds light on model management strategies for data-driven optimization in energy exploitation.
Facial Expression Recognition (FER) in the wild is an extremely challenging task. Recently, some Vision Transformers (ViT) have been explored for FER, but most of them perform worse than Convolutional Neural Networks (CNN). This is mainly because the newly proposed modules are difficult to train to convergence from scratch due to their lack of inductive bias, and they tend to focus on occluded and noisy areas. TransFER, a representative transformer-based method for FER, alleviates this with multi-branch attention dropping but incurs excessive computation. In contrast, we present two attentive pooling (AP) modules that pool noisy features directly. The AP modules include Attentive Patch Pooling (APP) and Attentive Token Pooling (ATP). They aim to guide the model to emphasize the most discriminative features while reducing the impact of less relevant features. The proposed APP selects the most informative patches on CNN features, and ATP discards unimportant tokens in ViT. Simple to implement and free of learnable parameters, APP and ATP reduce the computational cost while boosting performance by pursuing only the most discriminative features. Qualitative results demonstrate the motivation for and effectiveness of our attentive pooling modules. In addition, quantitative results on six in-the-wild datasets show that our method outperforms other state-of-the-art methods.
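The token-discarding step described in the abstract above can be sketched as a parameter-free top-k selection. This is a minimal illustration, assuming each token already has an attention score (for example, the attention it receives from a class token); the function name `attentive_token_pooling` and the `keep_ratio` parameter are hypothetical, and the paper's exact scoring rule is not reproduced here.

```python
def attentive_token_pooling(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens and drop the rest, with no learnable parameters.

    tokens: list of token features (any objects)
    scores: one attention score per token (assumed given)
    keep_ratio: fraction of tokens to keep; at least one token is always kept
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not tokens:
        return []
    keep_count = max(1, int(len(tokens) * keep_ratio))
    # Rank token indices by score, highest first, and take the top keep_count.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_indices = sorted(ranked[:keep_count])  # restore original token order
    return [tokens[i] for i in kept_indices]


tokens = ["t0", "t1", "t2", "t3"]
scores = [0.10, 0.70, 0.05, 0.15]
pooled = attentive_token_pooling(tokens, scores, keep_ratio=0.5)
```

Because later layers see fewer tokens, the computation drops along with the low-scoring (potentially occluded or noisy) tokens, which matches the motivation given in the abstract.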
With the success of the prompt-tuning paradigm in Natural Language Processing (NLP), various prompt templates have been proposed to further stimulate specific knowledge for downstream tasks such as machine translation, text generation, and relation extraction. Existing prompt templates are mainly shared among all training samples and carry only the task description. However, training samples are quite diverse. A shared task description cannot stimulate the unique task-related information in each training sample, especially for tasks with a finite label space. To exploit this unique task-related information, we imitate the human decision process, which aims to find the contrastive attributes between the factual target and its potential counterfactuals. Thus, we propose the Counterfactual Contrastive Prompt-Tuning (CCPrompt) approach for many-class classification, e.g., relation classification, topic classification, and entity typing. Compared with simple classification tasks, these tasks have more complex finite label spaces and place stricter demands on prompts. First, we prune the finite label space to construct fact-counterfactual pairs. Then, we exploit the contrastive attributes by projecting training instances onto every fact-counterfactual pair. We further set up global prototypes corresponding to all contrastive attributes in order to select valid contrastive attributes as additional tokens in the prompt template. Finally, a simple Siamese representation learning scheme is employed to enhance the robustness of the model. We conduct experiments on relation classification, topic classification, and entity typing tasks in both fully supervised and few-shot settings. The results indicate that our model outperforms previous baselines.
Proper functioning of connected and automated vehicles (CAVs) is crucial for the safety and efficiency of future intelligent transport systems. Meanwhile, transitioning to fully autonomous driving requires a long period of mixed autonomy traffic, including both CAVs and human-driven vehicles. Thus, collaborative decision-making for CAVs is essential to generate appropriate driving behaviors that enhance the safety and efficiency of mixed autonomy traffic. In recent years, deep reinforcement learning (DRL) has been widely used to solve decision-making problems. However, existing DRL-based methods have mainly focused on the decision-making of a single CAV. Applied to mixed autonomy traffic, they cannot accurately represent the mutual effects of vehicles or model dynamic traffic environments. To address these shortcomings, this article proposes a graph reinforcement learning (GRL) approach for multi-agent decision-making of CAVs in mixed autonomy traffic. First, a generic and modular GRL framework is designed. Then, a systematic review of DRL and GRL methods is presented, focusing on the problems addressed in recent research. Moreover, a comparative study of different GRL methods is conducted based on the designed framework to verify their effectiveness. Results show that the GRL methods optimize multi-agent decision-making for CAVs in mixed autonomy traffic better than the DRL methods. Finally, challenges and future research directions are summarized. This study provides a valuable research reference for solving the multi-agent decision-making problems of CAVs in mixed autonomy traffic and can promote the implementation of GRL-based methods in intelligent transportation systems. The source code of our work can be found at https://github.com/Jacklinkk/Graph_CAVs.